Optimal Learning via the Fourier Transform for Sums of Independent Integer Random Variables

Authors

  • Ilias Diakonikolas
  • Daniel M. Kane
  • Alistair Stewart
Abstract

We study the structure and learnability of sums of independent integer random variables (SIIRVs). For $k \in \mathbb{Z}_+$, a $k$-SIIRV of order $n \in \mathbb{Z}_+$ is the probability distribution of the sum of $n$ mutually independent random variables, each supported on $\{0, 1, \ldots, k-1\}$. We denote by $\mathcal{S}_{n,k}$ the set of all $k$-SIIRVs of order $n$. How many samples are required to learn an arbitrary distribution in $\mathcal{S}_{n,k}$? In this paper, we tightly characterize the sample and computational complexity of this problem. More precisely, we design a computationally efficient algorithm that uses $\widetilde{O}(k/\epsilon^2)$ samples and learns an arbitrary $k$-SIIRV within error $\epsilon$ in total variation distance. Moreover, we show that the optimal sample complexity of this learning problem is $\Theta((k/\epsilon^2)\sqrt{\log(1/\epsilon)})$, i.e., we prove an upper bound and a matching information-theoretic lower bound. Our algorithm proceeds by learning the Fourier transform of the target $k$-SIIRV in its effective support. Its correctness relies on the approximate sparsity of the Fourier transform of $k$-SIIRVs, a structural property that we establish, roughly stating that the Fourier transform of a $k$-SIIRV has small magnitude outside a small set. Along the way we prove several new structural results about $k$-SIIRVs. As one of our main structural contributions, we give an efficient algorithm to construct a sparse proper $\epsilon$-cover for $\mathcal{S}_{n,k}$ in total variation distance. We also obtain a novel geometric characterization of the space of $k$-SIIRVs. Our characterization allows us to prove a tight lower bound on the size of $\epsilon$-covers for $\mathcal{S}_{n,k}$, establishing that our cover upper bound is optimal, and is the key ingredient in our tight sample complexity lower bound. Our approach of exploiting the sparsity of the Fourier transform in distribution learning is general and has recently found additional applications.
In a subsequent work [DKS15a], we use a generalization of this idea (in higher dimensions) to obtain the first efficient learning algorithm for Poisson multinomial distributions. In [DKS15b], we build on this approach to obtain the fastest known proper learning algorithm for Poisson binomial distributions (2-SIIRVs).

Author notes: Supported by EPSRC grant EP/L021749/1 and a Marie Curie Career Integration grant. Some of this work was performed while visiting the University of Edinburgh. Supported by EPSRC grant EP/L021749/1.
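The Fourier-based learning idea described in the abstract can be illustrated with the short Python sketch below. It is a simplified illustration under stated assumptions, not the authors' algorithm: the function names, the modulus `M`, the number of retained coefficients `num_coeffs`, and the rule of keeping the largest-magnitude empirical coefficients are all hypothetical choices made here (the abstract only says that the transform is learned on the small set where it has non-negligible magnitude; how that set is chosen in the paper is not reproduced). The sketch reduces samples modulo `M`, estimates the discrete Fourier transform $\widehat{p}(\xi) = \mathbb{E}[e^{-2\pi i \xi X / M}]$, zeroes all but a few coefficients, inverts, and maps each residue class back to an integer in a window of length `M` around the sample mean.

```python
# Illustrative sketch only; function names and parameters (M, num_coeffs) are hypothetical.
import bisect
import cmath
import math
import random


def siirv_pmf(components):
    """Exact PMF of X = X_1 + ... + X_n, where components[i][j] = Pr[X_i = j]
    for j in {0, ..., k-1}; computed by repeated convolution."""
    pmf = [1.0]
    for q in components:
        out = [0.0] * (len(pmf) + len(q) - 1)
        for a, pa in enumerate(pmf):
            for b, qb in enumerate(q):
                out[a + b] += pa * qb
        pmf = out
    return pmf


def sample_siirv(components, num_samples, rng):
    """Draw i.i.d. samples of X by summing one draw from each component."""
    cums = []
    for q in components:
        acc, cum = 0.0, []
        for p in q:
            acc += p
            cum.append(acc)
        cums.append(cum)
    samples = []
    for _ in range(num_samples):
        total = 0
        for cum in cums:
            # Inverse-CDF draw; min() guards against floating-point rounding at the top end.
            j = bisect.bisect_right(cum, rng.random() * cum[-1])
            total += min(j, len(cum) - 1)
        samples.append(total)
    return samples


def fourier_learn(samples, M, num_coeffs):
    """Simplified Fourier-based learner (heuristic, not the paper's algorithm)."""
    N = len(samples)
    hist = [0] * M
    for x in samples:
        hist[x % M] += 1  # reduce samples modulo M
    # Empirical DFT modulo M: hat_p(xi) = (1/N) * sum_j exp(-2*pi*i*xi*x_j/M).
    coeffs = [
        sum(c * cmath.exp(-2j * math.pi * xi * r / M) for r, c in enumerate(hist)) / N
        for xi in range(M)
    ]
    # Keep the num_coeffs largest-magnitude coefficients (xi = 0 has magnitude 1, so it is kept).
    keep = sorted(range(M), key=lambda xi: -abs(coeffs[xi]))[:num_coeffs]
    # Inverse DFT over the kept frequencies; clip negatives, then renormalize.
    q = [
        max(0.0, sum((coeffs[xi] * cmath.exp(2j * math.pi * xi * r / M)).real for xi in keep) / M)
        for r in range(M)
    ]
    total = sum(q)
    # Place residue r at the unique integer in [lo, lo + M) congruent to r mod M.
    lo = math.floor(sum(samples) / N - M / 2)
    return {lo + (r - lo) % M: q[r] / total for r in range(M)}


def tv_distance(p, q):
    """Total variation distance between two PMFs given as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in keys)


# Usage: a random 3-SIIRV of order 30 (all parameters chosen arbitrarily).
rng = random.Random(0)
k, n = 3, 30
components = []
for _ in range(n):
    w = [rng.random() for _ in range(k)]
    components.append([x / sum(w) for x in w])
true_pmf = dict(enumerate(siirv_pmf(components)))
samples = sample_siirv(components, 5000, rng)
learned = fourier_learn(samples, M=41, num_coeffs=15)
tv = tv_distance(true_pmf, learned)
print(f"TV distance to the true PMF: {tv:.3f}")
```

With these assumed parameters the window of length 41 covers roughly five standard deviations on each side of the mean, and keeping 15 coefficients is enough because the transform of this sum decays quickly away from $\xi = 0$; the exact distance printed depends on the random seed and sample size.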

Similar resources

Asymptotic Behavior of Weighted Sums of Weakly Negative Dependent Random Variables

Let be a sequence of weakly negative dependent (denoted by, WND) random variables with common distribution function F and let be other sequence of positive random variables independent of and for some and for all . In this paper, we study the asymptotic behavior of the tail probabilities of the maximum, weighted sums, randomly weighted sums and randomly indexed weighted sums of heavy...

Nearly Optimal Learning and Sparse Covers for Sums of Independent Integer Random Variables

For $k \in \mathbb{Z}_+$, a $k$-SIIRV of order $n \in \mathbb{Z}_+$ is the discrete probability distribution of the sum of $n$ mutually independent random variables, each supported on $\{0, 1, \ldots, k-1\}$. We denote by $\mathcal{S}_{n,k}$ the set of all $k$-SIIRVs of order $n$. In this paper we prove two main results: • We give a near-sample optimal and computationally efficient algorithm for learning $k$-SIIRVs from independent samples under the ...

Fourier-Based Testing for Families of Distributions

We study the general problem of testing whether an unknown discrete distribution belongs to a given family of distributions. More specifically, given a class of distributions $\mathcal{P}$ and sample access to an unknown distribution $P$, we want to distinguish (with high probability) between the case that $P \in \mathcal{P}$ and the case that $P$ is $\epsilon$-far, in total variation distance, from every distribution in $\mathcal{P}$. This is...

The relationship between Fourier and Mellin transforms, with applications to probability

The use of Fourier transforms for deriving probability densities of sums and differences of random variables is well known. The use of Mellin transforms to derive densities for products and quotients of random variables is less well known. We present the relationship between the Fourier and Mellin transform, and discuss the use of these transforms in deriving densities for algebraic combination...

On the bounds in Poisson approximation for independent geometric distributed random variables

The main purpose of this note is to establish some bounds in Poisson approximation for row-wise arrays of independent geometric distributed random variables using the operator method. Some results related to random sums of independent geometric distributed random variables are also investigated.

Publication date: 2016